Goto

Collaborating Authors

 virtual agent


HealthDial: A No-Code LLM-Assisted Dialogue Authoring Tool for Healthcare Virtual Agents

Nouraei, Farnaz, Yong, Zhuorui, Bickmore, Timothy

arXiv.org Artificial Intelligence

We introduce HealthDial, a dialogue authoring tool that helps healthcare providers and educators create virtual agents that deliver health education and counseling to patients over multiple conversations. HealthDial leverages large language models (LLMs) to automatically create an initial session-based plan and conversations for each session using text-based patient health education materials as input. Authored dialogue is output in the form of finite state machines for virtual agent delivery so that all content can be validated and no unsafe advice is provided resulting from LLM hallucinations. LLM-drafted dialogue structure and language can be edited by the author in a no-code user interface to ensure validity and optimize clarity and impact. We conducted a feasibility and usability study with counselors and students to test our approach with an authoring task for cancer screening education. Participants used HealthDial and then tested their resulting dialogue by interacting with a 3D-animated virtual agent delivering the dialogue. Through participants' evaluations of the task experience and final dialogues, we show that HealthDial provides a promising first step for counselors to ensure full coverage of their health education materials, while creating understandable and actionable virtual agent dialogue with patients.


GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps

Tan, Yan Rui, Liu, Wenqi, Leong, Wai Lun, Tan, John Guan Zhong, Yong, Wayne Wen Huei, Shi, Fan, Teo, Rodney Swee Huat

arXiv.org Artificial Intelligence

Abstract-- Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo-3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock through obstacle-filled environment and also hardware-in-the-loop experiments where we successfully flocked a team of nine drones--six physical and three virtual-- in a forest environment. I. INTRODUCTION Flocking behavior, commonly observed in nature, involves the collective and coordinated movement of groups, such as flocks of birds or schools of fish.


React to This (RTT): A Nonverbal Turing Test for Embodied AI

Zhang, Chuxuan, Etesam, Yasaman, Lim, Angelica

arXiv.org Artificial Intelligence

We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment. In 1950, Turing [1] proposed the "imitation game" as a way to address the question: "Can machines think?" Since then, numerous attempts have been made to pass this test [2]. One of the earliest systems to highlight how surface-level language mimicry could deceive users was ELIZA [3], developed in 1965.


An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments

Korkiakoski, Mikko, Sheikhi, Saeid, Nyman, Jesper, Saariniemi, Jussi, Tapio, Kalle, Kostakos, Panos

arXiv.org Artificial Intelligence

Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs, a suspect, and a partner, using GPT-4 Turbo to engage participants in a scenario to determine the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo, and overall (cycle) latency. Results showed an average cycle latency of 7 seconds, influenced by the increasing conversational context. Believability scored 6.67 out of 10, with high ratings in behavior, social relationships, and intelligence but moderate scores in emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs, revealing the need for performance optimization to achieve increasingly immersive virtual experiences.


Embodied AI Agents: Modeling the World

Fung, Pascale, Bachrach, Yoram, Celikyilmaz, Asli, Chaudhuri, Kamalika, Chen, Delong, Chung, Willy, Dupoux, Emmanuel, Gong, Hongyu, Jégou, Hervé, Lazaric, Alessandro, Majumdar, Arjun, Madotto, Andrea, Meier, Franziska, Metze, Florian, Morency, Louis-Philippe, Moutakanni, Théo, Pino, Juan, Terver, Basile, Tighe, Joseph, Tomasello, Paden, Malik, Jitendra

arXiv.org Artificial Intelligence

This paper describes our research on AI agents embodied in visual, virtual or physical forms, enabling them to interact with both users and their environments. These agents, which include virtual avatars, wearable devices, and robots, are designed to perceive, learn and act within their surroundings, which makes them more similar to how humans learn and interact with the environments as compared to disembodied agents. We propose that the development of world models is central to reasoning and planning of embodied AI agents, allowing these agents to understand and predict their environment, to understand user intentions and social contexts, thereby enhancing their ability to perform complex tasks autonomously. World modeling encompasses the integration of multimodal perception, planning through reasoning for action and control, and memory to create a comprehensive understanding of the physical world. Beyond the physical world, we also propose to learn the mental world model of users to enable better human-agent collaboration.


Are We Generalizing from the Exception? An In-the-Wild Study on Group-Sensitive Conversation Design in Human-Agent Interactions

Müller, Ana, Jeschke, Sabina, Richert, Anja

arXiv.org Artificial Intelligence

Are We Generalizing from the Exception? Abstract -- This paper investigates the impact of a group-adaptive conversation design in two socially interactive agents (SIAs) through two real-world studies. Both SIAs - Furhat, a social robot, and MetaHuman, a virtual agent - were equipped with a conversational artificial intelligence (CAI) backend combining hybrid retrieval and generative models. The studies were carried out in an in-the-wild setting with a total of N = 188 participants who interacted with the SIAs - in dyads, triads or larger groups - at a German museum. Although the results did not reveal a significant effect of the group-sensitive conversation design on perceived satisfaction, the findings provide valuable insights into the challenges of adapting CAI for multi-party interactions and across different embodiments (robot vs. virtual agent) highlighting the need for multimodal strategies beyond linguistic pluralization. These insights contribute to the fields of Human-Agent Interaction (HAI), Human-Robot Interaction (HRI), and broader Human-Machine Interaction (HMI), providing insights for future research on effective dialogue adaptation in group settings. Conversational artificial intelligence (CAI) is the core technology that enables socially interactive agents (SIAs) to understand and generate human language. These agents - including social robots, chatbots, and virtual agents - rely on multimodal signals (e.g., text, speech) to engage in naturalistic interactions with humans [1].


Likable or Intelligent? Comparing Social Robots and Virtual Agents for Long-term Health Monitoring

Neef, Caterina, Richert, Anja

arXiv.org Artificial Intelligence

Using social robots and virtual agents (VAs) as interfaces for health monitoring systems for older adults offers the possibility of more engaging interactions that can support long-term health and well-being. While robots are characterized by their physical presence, software-based VAs are more scalable and flexible. Few comparisons of these interfaces exist in the human-robot and human-agent interaction domains, especially in long-term and real-world studies. In this work, we examined impressions of social robots and VAs at the beginning and end of an eight-week study in which older adults interacted with these systems independently in their homes. Using a between-subjects design, participants could choose which interface to evaluate during the study. While participants perceived the social robot as somewhat more likable, the VA was perceived as more intelligent. Our work provides a basis for further studies investigating factors most relevant for engaging interactions with social interfaces for long-term health monitoring.


CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech

Pahar, Madhurananda, Tao, Fuxiang, Mirheidari, Bahman, Pevy, Nathan, Bright, Rebecca, Gadgil, Swapnil, Sproson, Lise, Braun, Dorota, Illingworth, Caitlin, Blackburn, Daniel, Christensen, Heidi

arXiv.org Artificial Intelligence

The early signs of cognitive decline are often noticeable in conversational speech, and identifying those signs is crucial in dealing with later and more serious stages of neurodegenerative diseases. Clinical detection is costly and time-consuming and although there has been recent progress in the automatic detection of speech-based cues, those systems are trained on relatively small databases, lacking detailed metadata and demographic information. This paper presents CognoSpeak and its associated data collection efforts. CognoSpeak asks memory-probing long and short-term questions and administers standard cognitive tasks such as verbal and semantic fluency and picture description using a virtual agent on a mobile or web platform. In addition, it collects multimodal data such as audio and video along with a rich set of metadata from primary and secondary care, memory clinics and remote settings like people's homes. Here, we present results from 126 subjects whose audio was manually transcribed. Several classic classifiers, as well as large language model-based classifiers, have been investigated and evaluated across the different types of prompts. We demonstrate a high level of performance; in particular, we achieved an F1-score of 0.873 using a DistilBERT model to discriminate people with cognitive impairment (dementia and people with mild cognitive impairment (MCI)) from healthy volunteers using the memory responses, fluency tasks and cookie theft picture description. CognoSpeak is an automatic, remote, low-cost, repeatable, non-invasive and less stressful alternative to existing clinical cognitive assessments.


Virtual Agent-Based Communication Skills Training to Facilitate Health Persuasion Among Peers

Nouraei, Farnaz, Rebello, Keith, Fallah, Mina, Murali, Prasanth, Matuszak, Haley, Jap, Valerie, Parker, Andrea, Paasche-Orlow, Michael, Bickmore, Timothy

arXiv.org Artificial Intelligence

Many laypeople are motivated to improve the health behavior of their family or friends but do not know where to start, especially if the health behavior is potentially stigmatizing or controversial. We present an approach that uses virtual agents to coach community-based volunteers in health counseling techniques, such as motivational interviewing, and allows them to practice these skills in role-playing scenarios. We use this approach in a virtual agent-based system to increase COVID-19 vaccination by empowering users to influence their social network. In a between-subjects comparative design study, we test the effects of agent system interactivity and role-playing functionality on counseling outcomes, with participants evaluated by standardized patients and objective judges. We find that all versions are effective at producing peer counselors who score adequately on a standardized measure of counseling competence, and that participants were significantly more satisfied with interactive virtual agents compared to passive viewing of the training material. We discuss design implications for interpersonal skills training systems based on our findings.


The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent

Kroczek, Leon O. H., May, Alexander, Hettenkofer, Selina, Ruider, Andreas, Ludwig, Bernd, Mühlberger, Andreas

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in conversational tasks. Embodying an LLM as a virtual human allows users to engage in face-to-face social interactions in Virtual Reality. However, the influence of person- and task-related factors in social interactions with LLM-controlled agents remains unclear. In this study, forty-six participants interacted with a virtual agent whose persona was manipulated as extravert or introvert in three different conversational tasks (small talk, knowledge test, convincing). Social-evaluation, emotional experience, and realism were assessed using ratings. Interactive engagement was measured by quantifying participants' words and conversational turns. Finally, we measured participants' willingness to ask the agent for help during the knowledge test. Our findings show that the extraverted agent was more positively evaluated, elicited a more pleasant experience and greater engagement, and was assessed as more realistic compared to the introverted agent. Whereas persona did not affect the tendency to ask for help, participants were generally more confident in the answer when they had help of the LLM. Variation of personality traits of LLM-controlled embodied virtual agents, therefore, affects social-emotional processing and behavior in virtual interactions. Embodied virtual agents allow the presentation of naturalistic social encounters in a virtual environment.